Inverted Index Support for Numeric Search
نویسندگان
چکیده
Today’s search engines are increasingly required to broaden their capabilities beyond free-text search. More complex features, such as supporting range constraints over numeric data, are becoming common; structured search over XML data will soon follow. This is particularly true in the enterprise search domain, where engines attempt to integrate data from the Web and corporate knowledge portals with data residing in proprietary databases. In this paper we extend previous schemes by which an inverted index based search engine can efficiently support queries that contain numeric restrictions in addition to standard, free-text portions. Furthermore, we analyze both the known schemes and our extensions in terms of index-build time, index space and query processing time. We show how to maximize query processing performance while respecting limits on index size and build time, or conversely, how to minimize index space and build time while maintaining guarantees on runtime performance. Thus, we concisely analyze the trade-off between index size and build time, and runtime performance. Finally, we present experimental results that demonstrate significant performance benefits attained by our method, as compared to alternative approaches.
منابع مشابه
A Generic Inverted Index Framework for Similarity Search on the GPU - Technical Report
Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general paralleliz...
متن کاملParallel Search Using Partitioned Inverted Files
We examine the search of partitioned inverted files with particular emphasis on issues that arise from different types of partitioning methods. Two types of index partitions are investigated: namely Termld and Docld. We describe the search operations implemented in order to support parallelism in probabilistic search. We also describe higher level features such as search topologies in parallel ...
متن کاملAccess-Ordered Indexes
Search engines are an essential tool for modern life. We use them to discover new information on diverse topics and to locate a wide range of resources. The search process in all practical search engines is supported by an inverted index structure that stores all search terms and their locations within the searchable document collection. Inverted indexes are highly optimised, and significant wo...
متن کاملInverted indexes: Types and techniques
There has been a s ubstantial amount of research on high performance inverted index because most web and search engines use an inverted index to execute queries. Documents are normally stored as lists of words, but inverted indexes invert this by storing for each word the list of documents that the word appears in, hence the name “inverted index”. This paper presents the crucial research findin...
متن کامل3D Inverted Index with Cache Sharing for Web Search Engines
Web search engines achieve efficient performance by partitioning and replicating the indexing data structure used to support query processing. Current practice simply partitions and replicates the text collection on the set of cluster processors and then constructs in each processor an index data structure. This paper proposes a different approach by constructing an index data structure that pr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Internet Mathematics
دوره 3 شماره
صفحات -
تاریخ انتشار 2007